Gaussian Statistics and Unsupervised Learning
Abstract
This tutorial presents the properties of the Gaussian probability density function. Subsequently, supervised and unsupervised pattern recognition methods are treated. Supervised classification algorithms are based on a labeled data set. The knowledge about the class membership of this training data set is used for the classification of new samples. Unsupervised learning methods establish clusters from an unlabeled training data set. Clustering algorithms such as the K-means, the EM (expectation-maximization) algorithm, and the Viterbi-EM algorithm are presented.

Usage

To make full use of this tutorial you have to
1. Download the file Gaussian.zip (http://www.igi.tugraz.at/lehre/CI/tutorials/Gaussian.zip), which contains this tutorial and the accompanying Matlab programs.
2. Unzip Gaussian.zip, which will generate a subdirectory named Gaussian/matlab where you can find all the Matlab programs.
3. Add the path Gaussian/matlab to the Matlab search path, for example with a command like addpath('C:\Work\Gaussian\matlab') if you are using a Windows machine, or with a command like addpath('/home/jack/Gaussian/matlab') if you are on a Unix/Linux machine.

Sources

This tutorial is based on
• EPFL lab notes “Introduction to Gaussian Statistics and Statistical Pattern Recognition” by Hervé Bourlard, Sacha Krstulović, and Mathew Magimai-Doss.

1 Gaussian statistics

1.1 Samples from a Gaussian density

Useful formulas and definitions:
• The Gaussian probability density function (pdf) for the d-dimensional random variable $x \sim \mathcal{N}(\mu, \Sigma)$ (i.e., variable $x \in \mathbb{R}^d$ following the Gaussian, or Normal, probability law) is given by:

$$ g_{(\mu,\Sigma)}(x) = \frac{1}{\sqrt{2\pi}^{\,d}\,\sqrt{\det(\Sigma)}}\; e^{-\frac{1}{2}(x-\mu)^{T}\Sigma^{-1}(x-\mu)} \tag{1} $$

where $\mu$ is the mean vector and $\Sigma$ is the covariance matrix. $\mu$ and $\Sigma$ are the parameters of the Gaussian distribution.
• The mean vector $\mu$ contains the mean values of each dimension, $\mu_i = E(x_i)$, with $E(x)$ being the expected value of $x$.
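As a rough illustration of equation (1), the following sketch evaluates the Gaussian density at a single point. It is written in Python with the standard library only and is not one of the accompanying Matlab programs. The function names `cholesky` and `gaussian_pdf` are hypothetical helpers introduced here, and the sketch assumes that Σ is symmetric and positive definite. Rather than inverting Σ explicitly, it factors Σ = L Lᵀ, which yields both the quadratic form in the exponent and the square root of the determinant.

```python
import math


def cholesky(a):
    """Return lower-triangular L with L * L^T = a (a symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                diag = a[i][i] - s
                if diag <= 0.0:
                    raise ValueError("covariance matrix must be positive definite")
                low[i][j] = math.sqrt(diag)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def gaussian_pdf(x, mu, sigma):
    """Evaluate equation (1) at point x for mean vector mu and covariance sigma."""
    d = len(mu)
    low = cholesky(sigma)
    diff = [xi - mi for xi, mi in zip(x, mu)]
    # Solve L y = (x - mu) by forward substitution, so that
    # y^T y equals (x - mu)^T Sigma^{-1} (x - mu).
    y = []
    for i in range(d):
        s = sum(low[i][k] * y[k] for k in range(i))
        y.append((diff[i] - s) / low[i][i])
    quad_form = sum(v * v for v in y)
    # sqrt(det(Sigma)) is the product of the diagonal entries of L.
    sqrt_det = math.prod(low[i][i] for i in range(d))
    norm = math.sqrt(2.0 * math.pi) ** d * sqrt_det
    return math.exp(-0.5 * quad_form) / norm


if __name__ == "__main__":
    # Standard 2-D Gaussian evaluated at its mean: approximately 1 / (2 * pi).
    print(gaussian_pdf([0.0, 0.0], [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]))
```

For d = 1 the function reduces to the familiar univariate density; for example, `gaussian_pdf([0.0], [0.0], [[1.0]])` evaluates to 1/√(2π).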
Similar resources
STATS 306B Methods for Applied Statistics: Unsupervised Learning
(a) Implement the EM algorithm for an HMM with hidden states $z_t \in \{1, \dots, k\}$ (for any $k > 1$) and isotropic Gaussian emission probabilities, $p(x_t \mid z_t)$, for $x_t \in \mathbb{R}^d$ ($d \ge 1$). That is, $x_t \mid z_t = j \sim \mathcal{N}(\mu_j, \sigma_j^2 I)$ for unknown parameters $(\mu_j, \sigma_j^2)$. Do not use a pre-existing implementation. Note: The $\alpha$ and $\beta$ recursions involve the repeated multiplication of small numbers and hence are susceptibl...
The Minimum Transfer Cost Principle for Model-Order Selection
The goal of model-order selection is to select a model variant that generalizes best from training data to unseen test data. In unsupervised learning without any labels, the computation of the generalization error of a solution poses a conceptual problem which we address in this paper. We formulate the principle of “minimum transfer costs” for model-order selection. This principle renders the c...
Kernel discriminant analysis and clustering with parsimonious Gaussian process models
This work presents a family of parsimonious Gaussian process models which allow to build, from a finite sample, a model-based classifier in an infinite dimensional space. The proposed parsimonious models are obtained by constraining the eigendecomposition of the Gaussian processes modeling each class. This allows in particular to use non-linear mapping functions which project the observations i...
Unsupervised Learning for Hierarchical Clustering Using Statistical Information
This paper proposes a novel hierarchical clustering method that can classify given data without specified knowledge of the number of classes. In this method, at each node of a hierarchical classification tree, log-linearized Gaussian mixture networks [2] are utilized as classifiers to divide data into two subclasses based on statistical information, which are then classified into secondary subc...
High-Dimensional Unsupervised Active Learning Method
In this work, a hierarchical ensemble of projected clustering algorithm for high-dimensional data is proposed. The basic concept of the algorithm is based on the active learning method (ALM) which is a fuzzy learning scheme, inspired by some behavioral features of human brain functionality. High-dimensional unsupervised active learning method (HUALM) is a clustering algorithm which blurs the da...